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Abstract 

Nested multi-step stochastic correction offers a possibility to improve updat- 
ing algorithms for numerical simulations of lattice gauge theories with fermions. 
The corresponding generalisations of the two-step multi-boson (TSMB) algo- 
rithm as well as some applications with hybrid Monte Carlo (HMC) algorithms 
are considered. 

1 Introduction 

The main task in numerical Monte Carlo simulations of lattice gauge theories with 
fermions is to evaluate the (ratio of) fermion determinants appearing in the Boltz- 
mann weight for the gauge fields. The idea of the stochastic ("noisy") correction 
is to prepare a new proposal of the gauge configuration during updating by some 
approximation of the determinant ratio and accept or reject the change based on a 
stochastic estimator. This "stochastic correction step" takes care of the deviation of 
the approximate determinant ratio from the exact one. 

In multi-boson updating algorithms 2 it is natural to introduce a stochastic cor- 
rection step in order to correct for the deviations of the applied polynomial approxima- 
tions. In special cases it is possible to perform the correction by an iterative inverter 
0. More generally, the correction step can be based on successively better polynomial 
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approximations, as in the two-step multi-boson (TSMB) algorithm pjjj. A suitable way 
to obtain the necessary polynomial approximations is to use a recursive scheme pro- 
viding least-square optimisation Based on this stochastic correction scheme, the 
TSMB updating algorithm has been successfully applied in several numerical simula- 
tion projects both in super symmetric Yang Mills theory (see [Jj and references therein) 
and in QCD (see, for instance, [El El EHl HU)- 

In the present paper we generalise the idea of stochastic correction into a scheme 
of nested successive corrections based on polynomial approximations with successively 
increasing precision. (A similar "multi-level Metropolis" scheme has been proposed 
in Ref. |12l H3_.) In the next section we consider multi-step multi-boson algorithms. 
The last section is devoted to different possibilities for combining multi-step stochastic 
correction with variants of the hybrid Monte Carlo (HMC) updating algorithm |14j . 
In particular, optimised HMC algorithms based on mass preconditioning |12[ I15j and 
polynomial hybrid Monte Carlo (PHMC) algorithms ^S] are considered. 



2 Multi-step multi-boson algorithms 

The multi-step multi-boson (MSMB) algorithm is a generalisation of the TSMB up- 
dating algorithm. Therefore, let us briefly recapitulate the basics of TSMB. Let us 
assume that the determinant of the Hermitian fermion matrix Q = is positive, at 
least on most of the gauge configurations occurring with non-negligible weight in the 
path integral. In this case the sign of the determinant can either be neglected or taken 
into account by reweighting on an ensemble of configurations obtained by updating 
without the sign. (If the sign of the determinant plays an important role then there is 
a "sign problem" which cannot be dealt with by a straightforward Monte Carlo simu- 
lation procedure.) Without the sign the determinant factor in the Boltzmann weight 
of the gauge configurations is 

|detQ| 2 ° = (detQ 2 )" , (1) 

where in case of Nf mass-degenerate Dirac-fermion flavours we have a = \Nf. (Note 
that for a Majorana fermion a = 4.) Of course, for several fermion flavours with 
different masses there are several factors as in Q. Applying determinant break-up 
[T71 H%] one writes 

(detQ 2 ) = (det Q 2 ) . (2) 

with some positive integer n#. In what follows we always consider a single determinant 
factor with an effective power a: 
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a = i^-. (3) 
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If there are several such factors in the path integral then each of them can be separately 
taken into account in the same way. 

The basic ingredient of TSMB is a polynomial approximation 



P(x) ~ x- a , x e [e, A] (4) 

where the interval [e, A] covers the eigenvalue spectrum of Q 2 on gauge configurations 
having a non-negligible weight in the path integral. The determinant factor in the 
Boltzmann weight can then be taken into account with Liischer's multi-boson repre- 
sentation. Assuming that the roots of the polynomial P(x) occur in complex conjugate 
pairs, one can introduce the equivalent forms 

P(Q 2 ) = rof[[(Q ± ^? + v)\ = ro f[(Q - p*)(Q - Pj ) , (5) 

where n is the degree of P(x) and the roots are rj = (fij + ifj) 2 with pj = pj + ivj. 
With the help of complex boson (pseudofermion) fields <&j X one can write 
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/ exp -EE KQ ~ PjXQ ~ Pi)\v* *i* [ • ( 6 ) 

[ j=l xy J 

In the representation (JHJ) the complex boson fields &j x , j = 1,2, ... ,n carry the 
indices of the corresponding fermion fields. For instance, in QCD with Wilson-type 
fermions there are colour and Dirac-spinor indices. Since the multi-boson action in JBJ) 
is local, similarly to the gauge field action, one can apply the usual bosonic updating 
algorithms like Metropolis, heatbath or overrelaxation. In fact, the multi-boson action 
is Gaussian hence for the multi-boson fields a global heatbath update is also possible 
which creates, for a fixed gauge field, a statistically independent new set of boson fields. 

The polynomial approximation in (JIJ) is not exact. In order to obtain an exact 
updating algorithm one has to correct for its deviation from the function to be approx- 
imated. One can easily show that for small fermion masses in lattice units the (typical) 
smallest eigenvalue of Q 2 behaves as (am) 2 and for a fixed quality of approximation 
within the interval [e, A] the degree of the polynomial is growing as n oc e~ l l 2 oc (om)" 1 . 
In general, the polynomial approximation has to be precise enough in order that the 
deviations in expectation values be smaller than the statistical errors. In practical 
applications, for instance in QCD simulations, this would require very high degree 
polynomials with n of the order 10 3 -10 4 . (For numerical examples showing the conver- 
gence rate of the polynomial approximations see [0].) Performing numerical simulations 
with such a high n is practically impossible (and would be in any case completely in- 
effective). The way out is to perform the corrections stochastically. 



For improving the approximation in Q a second polynomial is introduced: 



Pi(x)P 2 {x) ~ x~ a , x€[e,A]. (7) 

The first polynomial P\(x) gives a crude approximation as in @ with P\(x) = P(x). 
The second polynomial P2(x) gives a good approximation according to 

P 2 (x) ~ [x^ifx)]- 1 . (8) 

During the updating process Pi is realized by multi-boson updates whereas P 2 is 
taken into account stochastically by a noisy correction step. For this, after preparing 
a new set of gauge fields [U'] from the old one [U] by local updates, one generates a 
Gaussian random vector having a distribution 

e -ri t P2(Q[U] 2 )v 

J[d V ]e-^P2(Q[u] 2 )v ' (9) 
and accepts the change of the gauge field [U] — > [U'] with probability 

min{l, A(rj;[U'} <- [U])} , (10) 

where 

A( m [U'\ <- [U]) = exp {-rfP 2 (Q[U'] 2 )n + rfP 2 (Q[U] 2 )r)} . (11) 

One can show |3] that this update procedure satisfies the detailed balance condition 
and hence creates the correct distribution of the gauge fields. (See the proof for the 
more general case of MSMB given below in (|20j) -([23 |) .) 

The Gaussian noise vector r\ can be obtained from 7/ distributed according to the 
simple Gaussian distribution 

„—n / *v' 

(12) 



J[dri'} e -^v' 
by setting it equal to 

r? = P 2 (Q[[/] 2 )- V . (13) 

In order to obtain the inverse square root on the right hand side of (|13|) . one can 
proceed with a polynomial approximation 



P 2 {x) ~P 2 (x)-2 , xe[e,X\. (14) 

Note that here the interval [e, A] can be chosen differently, usually with e < e, from the 
approximation interval [e, A] for P 2 . 

The polynomial approximation in (J7J) can only become exact in the limit when 
the degree n 2 of the second polynomial P 2 is infinite. Instead of investigating the 
dependence of expectation values on n 2 by performing several simulations, it is also 
possible to fix some high value of n 2 for the simulation and perform another correction 
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in the measurement of expectation values by still finer polynomials. This is done 
by reweighting the configurations. (A similar reweighting procedure is applied in the 
PHMC algorithm of Ref. JH]-) This measurement correction is based on a further 
polynomial approximation P' with polynomial degree n' which satisfies 



The interval [e',A] can be chosen by convenience, for instance, such that e' = 0, A = 
AmaD where A max is an absolute upper bound of the eigenvalues of Q 2 . (In case of e' = 
the approximation interval is strictly speaking (e', A]. An absolute upper bound for the 
eigenvalues of Q 2 exists because the commonly used fermion matrices are bounded 
from above.) In practice, instead of e' = 0, it is more effective to take e' > and 
determine the eigenvalues below e' and the corresponding correction factors exactly. 
For the evaluation of P' one can use re'-independent recursive relations [£], which can 
be stopped by observing the required precision of the result. After reweighting the 
expectation value of a quantity A is given by 



value on the gauge field sequence, which is obtained in the two-step process described 
before, and on a sequence of independent 77's. The expectation value with respect to 
the 77-sequence can be considered as a Monte Carlo updating process with the trivial 
action S v = 77^77. The length of the 77-sequence on a fixed gauge configuration can, in 
principle, be arbitrarily chosen. In practice it has to be optimised for obtaining the 
smallest possible errors with a given amount of computer time. 

The polynomial approximations in P|). (jHj), (|14[) and Q15|) can be obtained in a 
recursive scheme providing least-square optimisation Ej- Numerical methods to 
determine the polynomial coefficients can be based either on arbitrary precision arith- 
metics |19| or on discretisation of the approximation interval [20]. The expansion in 
appropriately defined orthogonal polynomials is an important ingredient, both in de- 
termining the polynomial coefficients and in the application of the polynomials of the 
squared fermion matrix Q 2 on a vector. 

Least-square optimisation corresponds to minimising the L2-norm of the devia- 
tion. An often used alternative is to minimise the norm which is equivalent to the 
minimisation of the maximal relative deviation. In general, the goal is to obtain the 
smallest possible deviation of the expectation values with the smallest possible poly- 
nomial degree. The experience with the least-square optimisation in TSMB has been 
rather satisfactory because it gives the best overall fit of the lattice action with a given 
polynomial degree. (For numerical examples comparing L2- with Loo-optimisation see 
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Ref. 0.) The often stated advantage of minimising the upper limit of the relative devi- 
ation of the lattice action is relativised by the fact that the deviation of the expectation 
values from the correct ones is in general a complicated function of the deviation in 
the lattice action. 

The multi-step multi-boson (MSMB) updating algorithm is a straightforward gen- 
eralisation of TSMB updating. Instead of the two-step approximation in Q we now 
consider a sequence of polynomial approximations of arbitrary length: 

P l {x)P 2 (x)...P k (x) ~x~ a , x€[e k ,X]. (17) 

Here the subsequent polynomials define approximations with increasing precision ac- 
cording to 

P i (x)~[x a P L (x)...P i - 1 (x)]- 1 , (i = 2,3,...,fc) . (18) 

The first polynomial Pi is realized during updating by local updates as in TSMB. 
The higher approximations Pi , . . . , Pj~ are implemented by a sequence of nested noisy 
correction steps as in (|9]l- (jlljl . The necessary Gaussian distributions of noise vectors 
can be obtained by appropriate polynomials, similarly to ifTljl : 

Pi(x) ~ Pi(x)-* , (i = 2,3,...,k) , x€[e fc) A]. (19) 

The proof of the detailed balance condition for MSMB goes essentially in the same 
way as for TSMB. The aim is to reproduce with the first % correction steps the canonical 
distribution of the gauge field 

w {i) [U]=e- s ° [u] {detP 1 [U]detP 2 [U]...detP i [U]y 1 , (i = 1, 2, . . . , jfe) , (20) 

where the short notation Pi[U] = Pi(Q[U] 2 ) is used and S g [U] denotes the action for 
the gauge field. Let us assume that detailed balance holds for the first (i — 1) steps, 
that is the transition probability Pa_i\([U'] <— [U]) satisfies 

P(i-l)([U'} - [U])e' s ^ {detP 1 [[/]...detp_ 1 [[/]}- 1 = 

P (i _l)([E7] - [U'})e- s °W {detP 1 [U'}...detP i ^ 1 [U'}}~ 1 . (21) 

The transition probability of the i'th step is a product of Pu-i) {[V] <— [U]) with the 
acceptance probability Pu\ a [\Xf'] <— [U]): 

P(^[U'} - [U]) = P^U'] - [U])P {€)a {[U>] - [U]) . (22) 

It is easy to show that if P(i) a ([U'] <— [U]) is defined according to (|9|)- (|llj) with P 2 
replaced by Pi then the acceptance probability satisfies 

P(i)a([U'] - [U]) {detp^]}- 1 = P (l)a {[U] +- [U']) {detP,[U']Y l . (23) 
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From this immediately follows that the transition probability of the i'th step 
P({\([U'] <— [C/]) satisfies the detailed balance condition 1)21(1 with (i — 1) replaced by 
(t). 

An alternative way to prove that the described procedure creates the correct distri- 
bution of the gauge fields is to consider the fields ij as additional pseudofermion fields 
in the Markov chain with the lattice action given by the exponent in @. 

The advantage of the multi-step scheme compared to the two-step one is that the 
lower approximations can be chosen to be less accurate and consequently have lower 
polynomial degrees and are faster to perform. The last approximations, which are 
very precise and need high polynomial degrees, can be done less frequently. The last 
polynomial P^ can already be chosen so precise that, for some given statistical error, 
the measurement correction with P' becomes unnecessary. 

An easy generalisation of the multi-step scheme described until now is to require 
the correct function to be approximated in (|17|) only in the last step and allow for 
functions easier to approximate in the previous steps. This means that (|18|) can be 
generalised, for instance, to 

Pi(x) ~ [{x + Pi ) a P^x)...P i - l [x)}- 1 , (i = l,2,...,fc) . (24) 

with positive pi and p^ = 0. This has a resemblance to the "mass preconditioning" 
as introduced for HMC algorithms in Ref. |12l I15j . The advantage of (|24[) is that for 
Pi > one can decrease the degree of the polynomial Pi(x) and at the same time, if 
Pi/ Pi— i is not much smaller than 1, the acceptance in the z'th correction step remains 
high enough. 

There are other multi-step approximation schemes conveivable: for instance, one 
can take Pi(x) ~ x~ a ' k , (i = 1, . . . , k) which corresponds to the determinant breakup 
in (J2J- Similarly, "mass preconditioning" can also be considered as a generalisation of 
determinant breakup. 

We performed several tests with the MSMB algorithms in some of the simulation 
points of Ref. with the Wilson fermion action for two flavours of quarks and the 
DBW2 gauge action for the colour gauge field. In particular, on an 8 3 • 16 lattice 
at P = 0.55, k = 0.188, p = (simulation point (c) in with a bare quark mass 
in lattice units am q ~ 0.015) a three-step algorithm was tuned for obtaining better 
performance. (Here p denotes the "twisted mass" which is actually set equal to zero 
in these runs.) In another test run on a 16 3 • 32 lattice we have chosen a point where 
a detailed simulation has been performed recently with both the TSMB and HMC 
algorithm |22| . namely at (5 = 0.74, k = 0.158, p = with a bare quark mass in lattice 
units am q ~ 0.024. In a three-step algorithm the following parameters were chosen: 
n B = 2, m = 60, n 2 = 200, n 2 = 300, n 3 = 800, n 3 = 900. (The degree of the 
polynomials Pi and Pi is denoted by rtj and n%, respectively.) The second correction 
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step was called after performing 10 update cycles involving the first correction. The 
integrated autocorrelation for the average plaquette in these test runs were typically 
around t'°* — 10 full update cycles including the second correction. 

The simulation costs in these runs turned out to be, even with a moderate effort 
put in parameter tuning, by about a factor of 1.5 lower than in the corresponding well- 
tuned TSMB runs. The gain comes from the lower cost of the first correction compared 
to the correction step in TSMB. The cost of the second correction does not contribute 
much to the full cost because it is done infrequently. For instance, on the 16 3 • 32 lattice 
the TSMB run had the parameters n B = 4, m = 34, n 2 = 720, n 2 = 740. (Note that 
the cost of the correction is mainly determined by the product ns(«2 + ^2) which is 
5840 in TSMB and only 1000 in the first correction of MSMB.) 

3 Multi-step correction for HMC 

The first (updating) step producing a new gauge field configuration can also be replaced 
by Hybrid Monte Carlo trajectories ^3]. In this step some approximation of the fermion 
determinant can be used and after a few trajectories one can perform a stochastic 
correction step. The rest within a multi-step correction scheme is the same as in 
MSMB updating. 

A possible application of multi-step stochastic corrections is to perform a HMC 
update with a mass-preconditioned fermion matrix which corresponds to pi > in 
Eq. Q24|) and correct for the exact determinant (that is, p\ = 0) stochastically. The 
polynomials for the stochastic corrections are defined in the same way as in 1)24(1 . 

Another possibility is to start by an update step as in polynomial hybrid Monte 
Carlo (PHMC) ^Hj- in order to generate the correct distribution of pseudofermion 
fields at the beginning of the trajectory one needs a polynomial as in ()19|) also for 
i = 1: 

~ Pi(x)-3 , x€[e,A]. (25) 

In order to avoid very high degree first polynomials P\(x), which would cause problems 
with rounding errors in the calculation of the fermionic force |2Bj . one should use 
determinant break-up (see Eq. ©). The ordering of the root factors in the expression 
of the fermionic force ^H] is best done according to the procedure proposed in jSJ. 
Again, the stochastic correction steps can be performed during the update according 
to the procedure described in Section [2j 

Besides decreasing the polynomial degrees in the PHMC update step, another ad- 
vantage of applying determinant breakup is that both magnitude and variance of the 

— 1/2 

quark force is decreased approximately proportional to n B 

In some test runs on 8 3 • 16 lattices the performance of the PHMC algorithm with 
stochastic correction turned out to be promisingly good. In particular, we performed 
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simulations with the parameters (3 = 0.55, k = 0.184, 0.186, 0.188, /x = corre- 
sponding to the points (a), (b) and (c) in Ref. with bare quark masses in lattice 
units am q ~ 0.071, 0.039, 0.015, respectively. The PHMC trajectories were created by 
applying the Sexton- Weingarten-Peardon integration scheme with multiple time scales 
|24| 125j . Gains up to factors of 5 were observed in comparison with the costs of the 
TSMB runs. The origin of this better performance is that the integrated autocorre- 
lations are shorter, whereas the costs for one update cycle are similar to TSMB (see 
Table 3 of ^^)- These numbers also show that in these points PHMC with stochastic 
correction is better than MSMB. 

4 Summary 

In summary, multi-step stochastic correction is a useful and flexible tool which can be 
implemented in both multi-bosonic and hybrid Monte Carlo update algorithms. In the 
present paper we reported on first tests with the multi-step multi-boson (MSMB) and 
stochastically corrected polynomial Monte Carlo algorithm which look promising. In 
our test runs on relatively small lattices and with moderately small quark masses the 
PHMC algorithm with stochastic correction is faster than MSMB. Of course, further 
tests on larger lattices and at smaller quark masses are necessary before applying these 
updating algorithms in large scale simulations. The relation between the cost factors 
of MSMB versus PHMC may also be different depending on the lattice volume and 
quark mass. 

Based on our experience with the TSMB algorithm, we expect the computational 
costs of our multi-step stochastic correction schemes to increase only slightly faster 
than linear with the number of lattice sites. This differs from the multi-level Metropolis 
scheme proposed in Ref. |121 113j where the volume dependence is quadratic. 

An important feature of both the MSMB and of the PHMC algorithm with multi- 
step stochastic correction is that they are applicable for odd numbers of flavours, too, 
provided that there is no sign problem with the fermion determinant. The same holds 
for the rational hybrid Monte Carlo (RHMC) algorithm where multi-step stochastic 
correction might also be useful. 

The main advantage of the stochastic correction in several steps compared to a 
single stochastic correction is that the costly last correction has to be done infrequently. 
This feature becomes increasingly more important for large lattices at small fermion 
masses where the cost of the last correction increases proportional to the inverse quark 
mass in lattice units. 
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